-
-
Notifications
You must be signed in to change notification settings - Fork 0
Feat/distributed backend #51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Add write quorum failure detection and error handling - Implement write acknowledgment tracking with atomic counters - Add write attempt, ack, and quorum failure metrics to distMetrics - Expose WriteQuorumFailures, WriteAcks, WriteAttempts in DistMetrics - Update README roadmap to reflect completed quorum consistency features - Add timing placeholder for potential future latency metrics This enables reliable distributed writes with configurable consistency levels and provides observability into write operation success rates and quorum enforcement.
- Add TestWriteQuorumSuccess to verify QUORUM writes succeed with majority acks - Add TestWriteQuorumFailure placeholder for ALL consistency failure scenarios - Include proper test setup with InProcessTransport and multiple nodes - Configure replication factor and write consistency levels for testing - Skip failure test until dynamic membership test harness is available These tests validate the distributed write quorum implementation and ensure proper behavior under different consistency level requirements.
- Add test helpers for distributed memory hint queue inspection and replay control - Implement complete hinted handoff integration test covering replica failure/recovery scenarios - Improve Makefile with test build tags for better test organization and optional pre-commit setup - Fix golangci-lint command typo and enhance linting with test build tags - Minor test formatting and error handling improvements This establishes robust testing infrastructure for distributed caching reliability features.
- Add DistConfig for external cluster configuration with constructor support - Implement canonical HTTP endpoints (/internal/set, /internal/get, /internal/del) - Add latency histogram collection for Set/Get/Remove operations with atomic buckets - Refactor DistTransport interfaces into separate files for modularity - Create comprehensive distributed backend roadmap with 7-phase implementation plan - Add Phase 1 integration tests with multi-node HTTP quorum validation - Include distributed operation benchmarks for performance tracking - Enhance README with consistency semantics, hinted handoff, and build tags documentation - Add test helpers for HTTP server lifecycle management under test build tag This establishes the foundation for production-grade distributed caching with configurable consistency levels, real network transport, and observable metrics. Implements data plane requirements from roadmap Phase 1. Breaking changes: None (backward compatible with existing in-process transport)
|
Running Code Quality on PRs by uploading data to Trunk will soon be removed. You can still run checks on your PRs using trunk-action - see the migration guide for more information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces the "Feat/distributed backend" by implementing a distributed backend with comprehensive test coverage and infrastructure improvements. The changes establish the foundation for a multi-node cluster with quorum semantics, hinted handoff, HTTP transport, and performance monitoring.
Key Changes:
- Introduces distributed memory backend with quorum read/write consistency, hinted handoff for replica failures, and HTTP-based transport layer
- Adds comprehensive test suite covering quorum operations, hint replay, and integration scenarios with proper build tag separation
- Implements latency monitoring, enhanced metrics collection, and canonical HTTP endpoints with backward compatibility
Reviewed Changes
Copilot reviewed 19 out of 21 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/integration/dist_phase1_test.go | New integration test verifying three-node quorum operations over HTTP transport |
| tests/hypercache_distmemory_write_quorum_test.go | Test coverage for write quorum success/failure scenarios with metrics validation |
| tests/hypercache_distmemory_hinted_handoff_test.go | Test coverage for hinted handoff replay mechanism when replicas are unavailable |
| tests/benchmarkdist/hypercache_dist_benchmark_test.go | Benchmark tests for distributed memory Set/Get operations with replication |
| pkg/backend/dist_transport.go | In-process transport implementation extracted from main file for better organization |
| pkg/backend/dist_memory_testhelpers.go | Test-only helper methods for controlling HTTP server and hint replay |
| pkg/backend/dist_memory.go | Enhanced with config constructor, latency monitoring, and improved metrics |
| pkg/backend/dist_latency.go | New latency histogram collector for performance monitoring |
| pkg/backend/dist_http_transport.go | Updated to use canonical endpoints while maintaining backward compatibility |
| pkg/backend/dist_http_server.go | Added canonical HTTP endpoints alongside legacy paths |
| ROADMAP.md | New comprehensive roadmap document outlining distributed backend evolution |
| README.md | Updated documentation covering consistency semantics and build tags |
| Makefile | Modified to include test build tags in testing and linting commands |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| if asserted, ok := cfg.(minimalConfig); ok { // best-effort copy | ||
| mc = asserted | ||
| } |
Copilot
AI
Aug 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type assertion to minimalConfig will always fail since cfg is of type 'any' and minimalConfig is defined locally. This suggests the function may not work as intended for external config types.
| if asserted, ok := cfg.(minimalConfig); ok { // best-effort copy | |
| mc = asserted | |
| } | |
| // Use reflection to copy matching fields from cfg to mc | |
| cfgVal := reflect.ValueOf(cfg) | |
| if cfgVal.Kind() == reflect.Ptr { | |
| cfgVal = cfgVal.Elem() | |
| } | |
| mcVal := reflect.ValueOf(&mc).Elem() | |
| cfgType := cfgVal.Type() | |
| for i := 0; i < mcVal.NumField(); i++ { | |
| field := mcVal.Type().Field(i) | |
| cfgField, ok := cfgType.FieldByName(field.Name) | |
| if !ok || !cfgField.IsExported() { | |
| continue | |
| } | |
| src := cfgVal.FieldByName(field.Name) | |
| dst := mcVal.Field(i) | |
| if src.IsValid() && src.Type().AssignableTo(dst.Type()) { | |
| dst.Set(src) | |
| } | |
| } |
| //nolint:gochecknoglobals,mnd // bucket constants intentionally centralized | ||
| var latencyBuckets = [...]int64{ | ||
| int64(50 * time.Microsecond), | ||
| int64(100 * time.Microsecond), | ||
| int64(250 * time.Microsecond), | ||
| int64(500 * time.Microsecond), | ||
| int64(1 * time.Millisecond), | ||
| int64(2 * time.Millisecond), | ||
| int64(5 * time.Millisecond), | ||
| int64(10 * time.Millisecond), | ||
| int64(25 * time.Millisecond), | ||
| int64(50 * time.Millisecond), | ||
| int64(100 * time.Millisecond), | ||
| int64(250 * time.Millisecond), | ||
| int64(500 * time.Millisecond), | ||
| int64(1 * time.Second), | ||
| } | ||
|
|
||
| // distLatencyCollector collects latency histograms (fixed buckets) for core data plane ops. | ||
| // It's intentionally lightweight: lock free, atomic per bucket. | ||
| type distLatencyCollector struct { | ||
| // buckets[op][bucket] | ||
| buckets [distOpCount][len(latencyBuckets) + 1]atomic.Uint64 // last bucket is +Inf |
Copilot
AI
Aug 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The global variable latencyBuckets should be declared as a const or moved to a package-level function to avoid potential modification and improve testability.
| //nolint:gochecknoglobals,mnd // bucket constants intentionally centralized | |
| var latencyBuckets = [...]int64{ | |
| int64(50 * time.Microsecond), | |
| int64(100 * time.Microsecond), | |
| int64(250 * time.Microsecond), | |
| int64(500 * time.Microsecond), | |
| int64(1 * time.Millisecond), | |
| int64(2 * time.Millisecond), | |
| int64(5 * time.Millisecond), | |
| int64(10 * time.Millisecond), | |
| int64(25 * time.Millisecond), | |
| int64(50 * time.Millisecond), | |
| int64(100 * time.Millisecond), | |
| int64(250 * time.Millisecond), | |
| int64(500 * time.Millisecond), | |
| int64(1 * time.Second), | |
| } | |
| // distLatencyCollector collects latency histograms (fixed buckets) for core data plane ops. | |
| // It's intentionally lightweight: lock free, atomic per bucket. | |
| type distLatencyCollector struct { | |
| // buckets[op][bucket] | |
| buckets [distOpCount][len(latencyBuckets) + 1]atomic.Uint64 // last bucket is +Inf | |
| //nolint:mnd // bucket constants intentionally centralized | |
| func latencyBuckets() [16]int64 { | |
| return [16]int64{ | |
| int64(50 * time.Microsecond), | |
| int64(100 * time.Microsecond), | |
| int64(250 * time.Microsecond), | |
| int64(500 * time.Microsecond), | |
| int64(1 * time.Millisecond), | |
| int64(2 * time.Millisecond), | |
| int64(5 * time.Millisecond), | |
| int64(10 * time.Millisecond), | |
| int64(25 * time.Millisecond), | |
| int64(50 * time.Millisecond), | |
| int64(100 * time.Millisecond), | |
| int64(250 * time.Millisecond), | |
| int64(500 * time.Millisecond), | |
| int64(1 * time.Second), | |
| } | |
| } | |
| // distLatencyCollector collects latency histograms (fixed buckets) for core data plane ops. | |
| // It's intentionally lightweight: lock free, atomic per bucket. | |
| type distLatencyCollector struct { | |
| // buckets[op][bucket] | |
| buckets [distOpCount][len(latencyBuckets()) + 1]atomic.Uint64 // last bucket is +Inf |
No description provided.